Authorship Attribution in Greek Tweets Using Author's Multilevel N-Gram Profiles

Authors

  • George K. Mikros
  • Kostas Perifanos
Abstract

George K. Mikros, Department of Italian Language and Literature, National and Kapodistrian University of Athens. Website: http://users.uoa.gr/~gmikros

The aim of this study is to explore authorship attribution methods in Greek tweets. We have developed the first Modern Greek Twitter corpus (GTC), consisting of 12,973 tweets crawled from 10 popular Greek users. We used this corpus to study the effectiveness of a specific document representation called Author's Multilevel N-gram Profile (AMNP) and the impact of different methods of training data construction on the task of authorship attribution. To address these research questions we used GTC to create 4 different datasets, in which tweets were merged into texts of different sizes (100, 75, 50 and 25 words). Results were evaluated using authorship attribution accuracy both in 10-fold cross-validation and on an external test set compiled from actual tweets. The AMNP representation achieved significantly better accuracies than single feature groups across all text sizes. A second experiment was conducted to evaluate whether the AMNP representation captures the stylometric profile of the tweets better than separate n-gram profiles do. For this reason we repeated the authorship attribution task on the four datasets of varying text chunk size under both testing conditions (10-fold cross-validation and external tweets). The results appear in Figures 3 and 4.
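The two preparation steps mentioned in the abstract, merging tweets into fixed-size texts and building a multilevel n-gram representation, can be illustrated with a minimal Python sketch. The abstract does not say which n-gram levels or sizes AMNP combines, so the choice of character and word n-grams with n = 2 and 3, the relative-frequency weighting, and the function names `merge_into_chunks` and `multilevel_profile` are all assumptions made here for illustration, not the authors' implementation.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def merge_into_chunks(tweets: Iterable[str], chunk_words: int) -> List[str]:
    """Concatenate one author's tweets and cut the word stream into
    texts of exactly `chunk_words` words (e.g. 100, 75, 50 or 25)."""
    words = [w for tweet in tweets for w in tweet.split()]
    # Assumption: the incomplete trailing chunk is discarded.
    usable = len(words) - len(words) % chunk_words
    return [" ".join(words[i:i + chunk_words])
            for i in range(0, usable, chunk_words)]


def ngrams(units: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams over a sequence of units."""
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]


def multilevel_profile(text: str, sizes: Sequence[int] = (2, 3)) -> Dict[tuple, float]:
    """Combine character-level and word-level n-grams into one feature
    dictionary keyed by (level, n, gram), valued by relative frequency.
    The levels and sizes are hypothetical; the abstract does not list them."""
    levels = {"char": list(text), "word": text.split()}
    profile: Dict[tuple, float] = {}
    for level, units in levels.items():
        for n in sizes:
            counts = Counter(ngrams(units, n))
            total = sum(counts.values())
            for gram, count in counts.items():
                profile[(level, n, gram)] = count / total
    return profile


# Usage: build 3-word chunks from toy tweets, then profile the first chunk.
chunks = merge_into_chunks(["a b c", "d e", "f g"], chunk_words=3)
features = multilevel_profile(chunks[0])
```

Keeping every level in a single dictionary means one classifier sees all feature groups at once, which is the contrast with single feature groups that the abstract draws. In practice the dictionaries would be vectorized and restricted to the most frequent n-grams before training.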


Similar resources

N-gram-based Author Profiles for Authorship Attribution

We present a novel method for computer-assisted authorship attribution based on character-level n-gram author profiles, which is motivated by an almost-forgotten, pioneering method from 1976. The existing approaches to automated authorship attribution implicitly build author profiles as vectors of feature weights, as language models, or similar. Our approach is based on byte-level n-grams, it is l...


Language Independent Authorship Attribution with Character Level N-Grams

We present a method for computer-assisted authorship attribution based on character-level n-gram language models. Our approach is based on simple information theoretic principles, and achieves improved performance across a variety of languages without requiring extensive pre-processing or feature selection. To demonstrate the effectiveness and language independence of our approach, we present ex...


Convolutional Neural Networks for Authorship Attribution of Short Texts

We present a model to perform authorship attribution of tweets using Convolutional Neural Networks (CNNs) over character n-grams. We also present a strategy that improves model interpretability by estimating the importance of input text fragments in the predicted classification. The experimental evaluation shows that text CNNs perform competitively and are able to outperform previous methods.


Enhancing Authorship Attribution By Utilizing Syntax Tree Profiles

The aim of modern authorship attribution approaches is to analyze known authors and to assign authorships to previously unseen and unlabeled text documents based on various features. In this paper we present a novel feature to enhance current attribution methods by analyzing the grammar of authors. To extract the feature, a syntax tree of each sentence of a document is calculated, which is then...


Language Independent Authorship Attribution using Character Level Language Models

We present a method for computer-assisted authorship attribution based on character-level n-gram language models. Our approach is based on simple information theoretic principles, and achieves improved performance across a variety of languages without requiring extensive pre-processing or feature selection. To demonstrate the effectiveness and language independence of our approach, we present exp...




Publication date: 2013